Road-testing the English Resource Grammar Over the British National Corpus
Abstract
This paper addresses two questions: (1) when a large deep processing resource developed for relatively closed domains is run over open text, what coverage does it have, and (2) what are the most effective and time-efficient ways of consolidating gaps in the coverage of
Similar resources
Identification of Verb-Particle Constructions in English
We propose different syntax-based methods for automatically identifying verb-particle constructions in English. The methods are based on the Deterministic Finite-state Automaton (DFA), Hidden Markov Model (HMM), and Synchronous Context-Free Grammar (SCFG). Our experiments show that the methods can achieve an F-score of 83.3% over our manually annotated test set consisting of Wikipedia articles and B...
The Influence of Prosody and Ambiguity on English Relativization Strategies
We present evidence that, for English, ambiguity is an active factor in the choice of relativization strategy and that, in speech, prosody plays a role in resolution of ambiguity over the internal role of the relativized constituent. The evidence is based on (semi-)automatic analysis and comparison of automatically-parsed written and spoken portions of the British National Corpus (BNC, Leech, 1...
The Syntactically Annotated ICE Corpus and the Automatic Induction of a Formal Grammar
The International Corpus of English is a corpus of national and regional varieties of English. The mega-word British component has been constructed, grammatically tagged, and syntactically parsed. This article is a description of work that aims at the automatic induction of a wide-coverage grammar from this corpus as well as an empirical evaluation of the grammar. It first of all describes the ...
The American National Corpus: More Than the Web Can Provide
The American National Corpus (ANC) project is developing a corpus comparable to the British National Corpus (BNC), covering American English. Recent interest in the web as a source of corpus materials has caused some in the language processing community to suggest that the development of a corpus of American English is unnecessary. However, we argue that far from being rendered superfluous by t...
Hedges in English for Academic Purposes: A Corpus-based Study of Iranian EFL Learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
Publication date: 2004